There are three types of data sets in this project: + Bird count data + Point data for each location + Distance adjusted values
Here are the current categories in the three data set types.
Bird count data:
| Column ID | Description |
|---|---|
| BBA_ID | Unique data collection point |
| SP_code | Four-letter species code |
| X1.75 | Counts of song within 75 meters, trial 1 |
| X1.75.1 | Counts of song outside 75 meters, trial 1 |
| X2.75 | Counts of song within 75 meters, trial 2 |
| X2.75.1 | Counts of song outside 75 meters, trial 2 |
| X3.75 | Counts of song within 75 meters, trial 3 |
| X3.75.1 | Counts of song outside 75 meters, trial 3 |
| X4.75 | Counts of song within 75 meters, trial 4 |
| X4.75.1 | Counts of song outside 75 meters, trial 4 |
| X5.75 | Counts of song within 75 meters, trial 5 |
| X5.75.1 | Counts of song outside 75 meters, trial 5 |
| Non.songCue.75 | Counts of signs other than song inside 75 meters |
| Non.songCue.75.1 | Counts of signs other than song outside 75 meters |
| Fly.Over | Counts of birds that flew over the collection point |
Point Data:
| Column ID | Description |
|---|---|
| BBA_ID | Unique data collection point |
| GPS_N | Northing in decimal degrees |
| GPS_W | Easting in decimal degrees |
| Observer | four letter observer code |
| obscode | Observer number |
| Date | Date of collection |
| Start | Start time |
| Temp | Temperature at time of collection |
| Q1 | Wind speed factor |
| Q2 | Precipitation factor |
| Q3 | Cloud cover factor |
| Q4 | Rank of hemlock abundance |
| Q5 | Rank of hemlock health |
| Q6 | Rank of spruce abundance |
| Q7 | Spruce qualifier |
| Q8 | Extent of forest understory |
| Q9 | Degree of deer browse impact on the understory |
| Q10 | Evidence of active land use change |
| Q11 | Pastured livestock or barn present? |
| Q12 | Rank of natural nesting cavity presence |
| Q13 | Rank of nest box presence |
| Q14 | Road type |
| Q15 | Dominant habitat within 75 meters |
Distance adjusted data: + Note: there are a few differences between each distance adjusted file
| Column ID | Description |
|---|---|
| X (not always present) | Not sure what this means |
| LID | Point count location ID |
| TYPE40 | Proportion of landscape composed of All Forest |
| PD40 | Patchiness of All Forest |
| CL40 | Clumpiness of All Forest |
| TYPE41 | Proportion of landscape composed of Deciduous Forest |
| PD41 | Patchiness of Deciduous Forest |
| CL41 | Clumpiness of Deciduous Forest |
| TYPE42 | Proportion of landscape composed of Evergreen Forest |
| PD42 | Patchiness of Evergreen Forest |
| CL42 | Clumpiness of Evergreen Forest |
| TYPE43 | Proportion of landscape composed of Mixed Forest |
| PD43 | Patchiness of Mixed Forest |
| CL43 | Clumpiness of Mixed Forest |
| TYPE80 | Proportion of landscape composed of Agricultural Land |
| obscode | Point count observer code |
| Dia | Julian date of point count |
| Ano | Year of point count |
| Start | Time point count began |
| LULC | Observer classification of land cover change at point count location |
| habitat | Observer classification of dominant habitat type at point count location |
| MQCOMP1 | First component of a PCA including the four Developed land cover classes, area-weighted average population density, and area-weighted average housing density |
| MQCOMP2 | First component of a PCA including the four Developed land cover classes, area-weighted average population density, and area-weighted average housing density |
| SHDI | Shannon Diversity Index of land cover classes |
| ELEV_MEAN | Mean elevation in landscape |
| ELEV_RANGE | Difference between highest and lowest elevation in landscape |
Since we need to combine these data sets, we need to know if they have the same values for at least one column. The only one that is in all the sets that is unique to the data entry is the ID code for the point + If the point codes do not match each other, we cannot combine them easily. + Easiest way: look for any that do not match when comparing values (logical operators)
sum(!(df_adj200$LID %in% df$BBA_ID))
## [1] 0
sum(!(df$BBA_ID %in% df_adj200$LID))
## [1] 269409
sum(!(df_adj1$LID %in% df$BBA_ID))
## [1] 142
sum(!(df$BBA_ID %in% df_adj1))
## [1] 500778
sum(!(df$BBA_ID %in% df_point$BBA_ID))
## [1] 0
sum(!(df_point$BBA_ID %in% df$BBA_ID))
## [1] 0
When comparing between the bird count data and the distance adjusted data for two distance data sets, there were differences between IDs. However, there were no entry differences when comparing the bird count and point data.
It is possible that the distance adjusted IDs only cover a few of the points, so it might help to join the adjusted distance data sets together. + Still want to keep the distance adjustment though (units in km)
df_adj1$dist <- 1
df_adj2$dist <- 2
df_adj4$dist <- 4
df_adj6$dist <- 6
df_adj8$dist <- 8
df_adj10$dist <- 10
df_adj12$dist <- 12
df_adj16$dist <- 16
df_adj200$dist <- 0.2
df_adj500$dist <- 0.5
Since some datasets have different columns, we will combine the ones with the same columns, then fix the difference
adj <- bind_rows(df_adj1, df_adj2, df_adj4, df_adj6, df_adj8, df_adj10, df_adj12, df_adj16)
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
adj2 <- bind_rows(df_adj200, df_adj500)
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
adj <- adj[, -1]
adj2 <- adj2[,-22]
Now, binding and comparing the adjusted sets:
adj_fin <- bind_rows(adj, adj2)
adj_fin$dist <- as.factor(adj_fin$dist)
sum(!(adj_fin$LID %in% df$BBA_ID))
## [1] 10146
sum(!(df$BBA_ID %in% adj_fin$LID))
## [1] 269409
There are still a lot of IDs that are not consistent; we will omit the adjusted data to retain most of the count and point information and to simplify things.
Since the counts do no good when determining the total counts at a certain time, we will add all the values from the count columns and place them into one column.
df$TotalCounts <- rowSums(df[, -c(1, 2)], na.rm = TRUE)
Now, we need to rename the columns Q1 - Q15 so they make sense. Afterward, we will make the names easier to work with.
names(df_point) <- c("BBA_ID" , "GPS_N", "GPS_W", "Observer", "obscode", "Date", "Start", "Temp","wind_speed", "precip", "cloud_cover", "hemlock_abundance", "hemlock_health", "spruce_abundance", "spruce_qual", "understory", "deer_impact_understory", "land_use", "livestock", "nest_cav", "nest_box", "road_type", "habitat")
names(df) <- str_replace(names(df), pattern = "\\.", replacement = "_")
names(df) <- str_replace(names(df), pattern = "X", replacement = "Trial")
names(df) <- str_replace(names(df), pattern = "75.1", replacement = "out")
names(df) <- str_replace(names(df), pattern = "75", replacement = "in")
Now we need to join the point data and the bird count data with full_join() and tidy the data set. + Because of length, this will not be included in the visible code.
Here is a list of the changes made to the data: + Joined bird count and point data + Added month and POSIX column for different date and time options + Changed the format of the Date column + Condensed the tallies for bird counts into “In75m,”Out75m“, and”TotalCounts“, with the total counts column being”In" and “Out” added + Got rid of NA values
Since there will be some variation between different kinds of birds, and since it is hard to put all the different species code in a facet-wrapped chart (among other things), I decided to add a species type column, using the species code and grouping them by similarity.
Note: the dataframe name changed from df to count1 in the cleaning process. Sorry if this causes any inconvenience
count1duck <- count1[count1$SP_code %in% c("ABDU", "AMCO", "AMWI", "BUFF", "BWTE", "CAGO", "CANV", "COLO", "COME", "GADW", "GWTE", "HOME", "LESC", "MALL", "MUSW", "NOPI", "PBGR", "RBME", "RNDU", "RUDU", "TRUS", "WODU"),]
count1wade <- count1[count1$SP_code %in% c("AMBI", "BCNH", "BEKI", "CAEG", "COMO", "GBHE", "GLIB", "GREG", "GRHE", "KIRA", "LEBI", "SACR", "SNEG", "SORA", "VIRA", "WISN", "YCNH", "YEAR"),]
count1pass <- count1[count1$SP_code %in% c("ACFL", "ALFL", "AMCR", "AMGO", "AMRE", "AMRO", "BANS", "BAOR", "BARS", "BAWW", "BBCU", "BCCH", "BGGN", "BHCO", "BHVI", "BLBW", "BLGR", "BLJA", "BLPW", "BOBO", "BRCR", "BRTH", "BRWA", "BTBW", "BTNW", "BWWA", "CACH", "CARW", "CAWA", "CCSP", "CEDW", "CERW", "CHSP", "CLSW", "COGR", "CORA", "COYE", "CSWA", "DEJU", "DICK", "EABL", "EAKI", "EAME", "EAPH", "EATO", "EAWP", "EUST", "FICR", "FISP", "GCFL", "GCKI", "GRCA", "GRSP", "GWWA", "HESP", "HETH", "HOFI", "HOLA", "HOSP", "HOWA", "HOWR", "INBU", "KEWA", "LASP", "LAWA", "LEFL", "LOSH", "LOWA", "MAWA", "MAWR", "MOWA", "NAWA", "NOBO", "NOCA", "NOMO", "NOPA", "NOWA", "NRWS", "OROR", "OSFL", "OVEN", "PISI", "PIWA", "PRAW", "PROW", "PUFI", "PUMA", "RBGR", "RBNU", "RCKI", "RECR", "REVI", "RWBL", "SAVS", "SCTA", "SEWR", "SOSP", "SUTA", "SWSP", "SWTH", "SWWA", "TRES", "TUTI", "VEER", "VESP", "WAVI", "WBNU", "WEME", "WEVI", "WEWA", "WIFL", "WIWR", "WOTH", "WTSP", "WWCR", "YBCH", "YBCU", "YBFL", "YRWA", "YTVI", "YTWA", "YWAR"),]
count1rapt <- count1[count1$SP_code %in% c("AMKE", "BADO", "BAEA", "BANO", "BLVU", "BWHA", "CLSW", "COHA", "EASO", "GHOW", "LEOW", "MERL", "NOGO", "NOHA", "NSWO", "OSPR", "PEFA", "RSHA", "RTHA", "SEOW", "SSHA", "TUVU"),]
count1shor <- count1[count1$SP_code %in% c("AMWO", "KILL", "SPSA", "UPSA"),]
count1tern <- count1[count1$SP_code %in% c("BLTE", "BOGU", "CATE", "COTE", "DCCO", "FOTE", "GBBG", "HEGU", "LAGU", "LBBG", "LETE", "RBGU", "RHWO"),]
count1game <- count1[count1$SP_code %in% c("CHUK", "RNPH", "RUGR", "WITU"),]
count1goat <- count1[count1$SP_code %in% c("CONI", "CWWI", "WPWI"),]
count1peck <- count1[count1$SP_code %in% c("DOWO", "HAWO", "NOFL", "PIWO", "RBWO", "YBSA"),]
count1dove <- count1[count1$SP_code %in% c("EUCD", "MODO", "ROPI"),]
count1other <- count1[count1$SP_code %in% c("CHSW", "RTHU"),]
count1unk <- count1[count1$SP_code %in% c("XXXX", "ZZZZ"),]
When doing this, I noticed that there were about 3000 rows of data missing, so I investigated a little… turns out there were some species codes that were not in the key, which is a problem. I promptly looked up the four-letter species codes for these species and added them into their respective categories, then redid the assigning.
note: Some species names were listed as two different forms of species codes, so I had to change them to the entry that was on the key.
count1$SP_code <- as.character(count1$SP_code)
count1$SP_code[count1$SP_code == "RODO"] <- "ROPI"
count1$SP_code[count1$SP_code == "TRFL"] <- "WIFL"
count1$SP_code[count1$SP_code == "BDOW"] <- "BADO"
count1$SP_code[count1$SP_code == "ESOW"] <- "EASO"
count1$SP_code[count1$SP_code == "ECDO"] <- "EUCD"
count1$SP_code <- as.factor(count1$SP_code)
count1duck <- count1[count1$SP_code %in% c("SNGO","ABDU", "AMCO", "AMWI", "BUFF", "BWTE", "CAGO", "CANV", "COLO", "COME", "GADW", "GWTE", "HOME", "LESC", "MALL", "MUSW", "NOPI", "PBGR", "RBME", "RNDU", "RUDU", "TRUS", "WODU"),]
count1wade <- count1[count1$SP_code %in% c("AMBI", "BCNH", "BEKI", "CAEG", "COMO", "GBHE", "GLIB", "GREG", "GRHE", "KIRA", "LEBI", "SACR", "SNEG", "SORA", "VIRA", "WISN", "YCNH", "YEAR"),]
count1pass <- count1[count1$SP_code %in% c("CMWA", "WCSP", "HYCH", "ACFL", "ALFL", "AMCR", "AMGO", "AMRE", "AMRO", "BANS", "BAOR", "BARS", "BAWW", "BBCU", "BCCH", "BGGN", "BHCO", "BHVI", "BLBW", "BLGR", "BLJA", "BLPW", "BOBO", "BRCR", "BRTH", "BRWA", "BTBW", "BTNW", "BWWA", "CACH", "CARW", "CAWA", "CCSP", "CEDW", "CERW", "CHSP", "CLSW", "COGR", "CORA", "COYE", "CSWA", "DEJU", "DICK", "EABL", "EAKI", "EAME", "EAPH", "EATO", "EAWP", "EUST", "FICR", "FISP", "GCFL", "GCKI", "GRCA", "GRSP", "GWWA", "HESP", "HETH", "HOFI", "HOLA", "HOSP", "HOWA", "HOWR", "INBU", "KEWA", "LASP", "LAWA", "LEFL", "LOSH", "LOWA", "MAWA", "MAWR", "MOWA", "NAWA", "NOBO", "NOCA", "NOMO", "NOPA", "NOWA", "NRWS", "OROR", "OSFL", "OVEN", "PISI", "PIWA", "PRAW", "PROW", "PUFI", "PUMA", "RBGR", "RBNU", "RCKI", "RECR", "REVI", "RWBL", "SAVS", "SCTA", "SEWR", "SOSP", "SUTA", "SWSP", "SWTH", "SWWA", "TRES", "TUTI", "VEER", "VESP", "WAVI", "WBNU", "WEME", "WEVI", "WEWA", "WIFL", "WIWR", "WOTH", "WTSP", "WWCR", "YBCH", "YBCU", "YBFL", "YRWA", "YTVI", "YTWA", "YWAR"),]
count1rapt <- count1[count1$SP_code %in% c("AMKE", "BADO", "BAEA", "BANO", "BLVU", "BWHA", "CLSW", "COHA", "EASO", "GHOW", "LEOW", "MERL", "NOGO", "NOHA", "NSWO", "OSPR", "PEFA", "RSHA", "RTHA", "SEOW", "SSHA", "TUVU"),]
count1shor <- count1[count1$SP_code %in% c("SBDO","LEYE", "AMWO", "KILL", "SPSA", "UPSA"),]
count1tern <- count1[count1$SP_code %in% c("BLTE", "BOGU", "CATE", "COTE", "DCCO", "FOTE", "GBBG", "HEGU", "LAGU", "LBBG", "LETE", "RBGU", "RHWO"),]
count1game <- count1[count1$SP_code %in% c("CHUK", "RNPH", "RUGR", "WITU"),]
count1goat <- count1[count1$SP_code %in% c("CONI", "CWWI", "WPWI"),]
count1peck <- count1[count1$SP_code %in% c("DOWO", "HAWO", "NOFL", "PIWO", "RBWO", "YBSA"),]
count1dove <- count1[count1$SP_code %in% c("EUCD", "MODO", "ROPI"),]
count1other <- count1[count1$SP_code %in% c("CHSW", "RTHU"),]
count1unk <- count1[count1$SP_code %in% c("XXXX", "MUSC", "ZZZZ"),]
Now I will add species type and join the datasets back together.
count1duck$species_type <- "Waterfowl"
count1wade$species_type <- "Wading_bird"
count1pass$species_type <- "Passerine"
count1rapt$species_type <- "Raptor"
count1shor$species_type <- "Shorebird"
count1tern$species_type <- "Larid"
count1game$species_type <- "Galliform"
count1goat$species_type <- "Goatsucker"
count1peck$species_type <- "Woodpecker"
count1dove$species_type <- "Dove"
count1other$species_type <- "Other"
count1unk$species_type <- "Unknown"
count2 <- rbind(count1dove, count1duck, count1game, count1goat, count1other, count1pass, count1peck, count1rapt, count1shor, count1tern, count1unk, count1wade)
I somehow ended up with duplicate entries… I will need to fix that.
Process: + find duplicated entries + omit doubles
count2 <- count2[!(duplicated(count2$number)),]
#####Exploring and Modeling the Data I am going to reload the data set from a saved file; the name will be df again.
Now, we can answer some questions about the data. The first question: How many entries were there for each bird (or bird type)?